Bayesian Blocks is a non-parametric analysis method for sequential data. Its goal is to represent a signal with an optimal set of data blocks, making it possible to deal with measurement errors and noise while highlighting the signal's important features and local structure. It is built to impose as few preconditions and assumptions as possible on the signal's shape, scale and resolution. The algorithm was introduced in 1998 and improved in 2013 by Scargle in the context of astronomical time series analysis.
The objective of this analysis is to divide the range of the independent variable into subintervals, called blocks, in which the dependent variable (e.g. counts of events, amplitude of a signal, ...) can be modeled as constant. The algorithm aims to provide the best partition of the independent variable by maximizing a goodness-of-fit measure called fitness. Using a fitness function that is block-additive, meaning that the total fitness of the partition is the sum of the fitness values of all the blocks, makes it possible to treat the blocks independently, in the sense that a block's fitness depends only on its own data. The block-additivity property can be expressed with the following equation:

$$F[P(T)] = \sum_{k=1}^{N_{blocks}} f(B_k)$$
where $F[P(T)]$ is the total fitness of the partition $P$ of interval $T$ and $f(B_k)$ is the fitness of block $B_k$, a measure of how well a constant signal represents the data within it.
The time at which the signal shows an abrupt transition in amplitude is called a change point. The algorithm is meant to detect these change points and use them as the edges of the data blocks. As with any non-parametric technique, finding the optimal partition involves controlling in some way the complexity of the estimated representation, namely the number of blocks. In typical applications the number of blocks is expected to be much smaller than the number of data points analysed. The number of blocks can be influenced by defining a prior distribution; for example, a geometric prior with the single parameter $\gamma$:

$$P(N_{blocks}) = P_{0}\,\gamma^{N_{blocks}}$$
where $P_{0}$ is a normalization constant (the desired 'correct detection rate' $p_0$ enters instead through the calibrated prior used below). In the usual fashion of Bayesian model selection, at high signal-to-noise ratios $N_{blocks}$ is determined by the structure of the signal, while at lower signal-to-noise the prior becomes more and more important.
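As a sketch of how such a prior enters the computation, the calibrated form used later in the implementation, $\log(p_1/(0.0136\,N^{0.478})) - 4$, can be evaluated numerically; the values of $p_1$ and $N$ below are illustrative, not from the text:

```r
# Calibrated prior as used in the implementation below; added to the fitness,
# it acts as a penalty (more negative for larger N)
calibrated_prior <- function(p1, N) log(p1 / (0.0136 * N^0.478)) - 4

calibrated_prior(0.01, 1000)  # approximately -7.61
```

Larger datasets thus receive a stronger penalty per block, which keeps the number of blocks from growing with $N$ at fixed signal structure.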
It is convenient to represent the algorithm's input data with data cells. A data cell is defined by two fundamental features: its width and its content. In this context a block is any set of consecutive cells and a partition is simply a collection of non-overlapping blocks.
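For event data, a common choice (and the one used in the implementation below) is to place cell edges halfway between consecutive time tags. A minimal sketch, with made-up time tags:

```r
# Data-cell edges at the midpoints between consecutive (sorted) time tags;
# the outermost edges coincide with the first and last tag
times <- c(1.1, 2.3, 4.2, 5.0)  # hypothetical, already sorted
edges <- c(times[1], 0.5 * (head(times, -1) + tail(times, -1)), tail(times, 1))
edges  # 1.10 1.70 3.25 4.60 5.00
```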
The number of possible partitions is $2^{N-1}$, where $N$ is the number of data cells, so an exhaustive search for the optimal partition is infeasible for any real dataset. Hence, the algorithm follows an iterative procedure similar to mathematical induction: beginning with the first data cell, a new cell is added at each step and the best partition is selected using the results of all the previous steps. Indeed, a key property exploited by the algorithm is expressed in the theorem below:
Theorem: Removing the last block of an optimal partition leaves an optimal partition
This fact reduces the computational cost of the algorithm to $O(N^2)$.
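The $2^{N-1}$ count can be checked by brute force for small $N$: each partition corresponds to one subset of the $N-1$ interior cell boundaries chosen as block edges. A quick sketch:

```r
# Closed form: one partition per subset of the N-1 interior boundaries
n_partitions <- function(N) 2^(N - 1)

# Brute force: enumerate every TRUE/FALSE choice for each interior boundary
brute_force <- function(N) {
  if (N == 1) return(1)
  nrow(expand.grid(rep(list(c(FALSE, TRUE)), N - 1)))
}
```

Already at $N = 100$ the search space exceeds $10^{29}$ partitions, which is why the dynamic-programming shortcut below is essential.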
The description of the core of the algorithm follows:
Let $P_{opt}(R)$ denote the optimal partition of the first $R$ cells. In the starting case $R = 1$, the only possible partition is trivially optimal.
At step $R+1$ the results of the previous $R$ steps are stored in the arrays best, containing the fitness of the old optimal partitions, and last, containing the position of the last change point at each iteration.
To obtain the optimal partition $P_{opt}(R+1)$, consider the set of all partitions that have the last block starting with the cell $k$ and ending at cell $R+1$.
By the additivity of the fitness function and the theorem above, the overall fitness of each such candidate partition can be computed by summing the fitness of its last block, $f(k)$, with the best fitness obtained in the previous steps:

$$A(k) = f(k) + \mathrm{best}(k-1), \qquad \mathrm{best}(0) = 0$$
where $A(k)$ contains the fitness of all the partitions $P(R+1)$ that can possibly be optimal. The optimal one is easily found by maximizing $A(k)$. The maximum value is stored in the array best, while the value of k, index of the maximum, is stored in last.
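A single induction step can be sketched with made-up numbers (the arrays here are illustrative, not real fitness values):

```r
best <- c(2.0, 3.5, 4.1)            # hypothetical optima of the first R = 3 steps
fitness_k <- c(1.0, 2.2, 0.7, 1.5)  # hypothetical fitness of each candidate last block, k = 1..4
A_k <- fitness_k + c(0, best)       # A(k) = f(k) + best(k-1), with best(0) = 0
best_new <- max(A_k)                # fitness of the new optimal partition
last_new <- which.max(A_k)          # start cell of its last block
```

Here `A_k` evaluates to (1.0, 4.2, 4.2, 5.6), so the new optimal partition has fitness 5.6 and its last block starts at cell 4.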
At the end of the iterations, when all the data cells have been used, the change point locations are retrieved from the information contained in last: starting from the last value in this array and repeatedly removing the section corresponding to the last block, the change points are found in reverse order. In symbols, denoting the number of change points with $N_{cp}$:

$$cp_{N_{cp}} = \mathrm{last}(N), \qquad cp_{i-1} = \mathrm{last}(cp_{i} - 1)$$
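The backward pass can be sketched on a hand-made last array (hypothetical values for $N = 6$ cells):

```r
last <- c(1, 1, 3, 3, 5, 5)  # hypothetical start cells of the optimal last blocks
change_points <- c()
icp <- last[length(last)]    # start cell of the final optimal last block
while (icp > 1) {
  change_points <- c(icp, change_points)  # prepend: points come out in reverse
  icp <- last[icp - 1]                    # peel off the last block
}
change_points  # 3 5
```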
The algorithm implementation receives as input the parameters documented in the function header below.
[Lines: 26 - 65] The first part of the function implements consistency checks between the data_mode value and the input data. In particular:
- data_mode=1 checks that times has been provided and that the data it contains are sorted; if repeated values are found, they are merged and their weights summed.
- data_mode=2 checks that weights has been passed and modifies the values equal to $0$: all the weights are divided by their minimum (non-zero) value and a small offset ($10^{-4}$) is added to the zero entries. Zero values must be removed because they would raise an error when computing the logarithm in the fitness function; this is legitimate because, according to [1], the signal amplitude can be treated as a 'nuisance parameter' and only the change points are returned.
- data_mode=3 checks weights and, if necessary, initializes the missing optional arrays.

[Lines: 72 - 99] This part defines the data cell edges and the Fitness and Prior functions. data_mode=1 and data_mode=2 share the same Fitness and Prior implementations, while data_mode=3 has its own.
Note that one can also use a flat prior by passing gamma = 1 to the algorithm.
[Lines: 102 - 130] Here the algorithm enters the loop over data cells, where the fitness is computed according to the selected data mode and the prior contribution is added to it. Note that the object fitness_k is an array containing the fitness values of all the possible last blocks at iteration k. This array is then summed with the best optimal partitions obtained in the previous steps up to k-1. The new optimal partition is obtained by selecting the maximum value of the array A_k and its index, which are stored in the arrays best and last, respectively.
[Lines: 132 - 140] Finally, the change points are retrieved by iteratively peeling off the last block from the array last.
Bayesian_Blocks <- function(data_mode=1, times=0, weights=0, sigmas=1, gamma=0.01, p1=0.01) {
# Bayesian Blocks algorithm
#
# A nonparametric modeling technique that finds the optimal segmentation of the data in the
# observation interval. This function returns a list of optimal change points of a
# one-dimensional time series or sequential data.
# Implementation based on [1].
# --------------------------------------------------------------------------------------------
# Parameters:
# - data_mode: '1' for Event data, '2' for Binned data, '3' for Points Measurements with
#              known error distribution (numbering chosen to be consistent with [1])
# - times : array containing time-tags (or in general the independent variable)
# - weights : array containing counts (data_mode=2) or measures (data_mode=3)
# - sigmas : array containing errors of measures (data_mode=3)
# - gamma    : float used to compute the Prior as log(gamma) (ignored when 'p1' is
#              explicitly passed with data_mode 1 or 2)
# - p1       : float used to compute the calibrated Prior as reported in [2]
#              (data_mode 1 and 2 only)
# --------------------------------------------------------------------------------------------
#
# [1]: J. D. Scargle et al., Astrophys. J. 764 (2013) 167, URL:
#      https://iopscience.iop.org/article/10.1088/0004-637X/764/2/167
# [2]: J. D. Scargle et al., *The Bayesian Block Algorithm*, 2013, URL:
#      https://arxiv.org/abs/1304.2818
### selecting data_mode ###
if (data_mode == 1) { # Event data
if (missing(times)) stop("with data_mode = 1, 'times' must be specified")
if (!missing(weights)) cat("passed weights will be ignored...",
" If they are meaningful please use data_mode = 2")
# sorting and dealing with repeated values
table <- rle(sort(times))
times <- table$values
weights <- table$lengths
}
else if (data_mode == 2) { # Binned data
if (missing(weights)) stop("with data_mode = 2, 'weights' must be specified")
if (missing(times )) { times <- c(1:length(weights)) }
# deal with zero entries that could give error with fitness computation
if (sum(weights)!=0) {
step <- min(weights[weights!=0])
weights <- weights/step # normalize weights
}
weights[weights==0] <- 1e-4
#times <- times[weights!=0]
#weights <- weights[weights!=0]
}
else if (data_mode == 3) { # Points Measurements
if (missing(weights)) stop("with data_mode = 3, 'weights' must be specified")
if (missing(times )) { times <- c(1:length(weights)) }
if (missing(sigmas)) { sigmas <- c(rep(1, length(weights))) }
# standardization of data
#weights <- (weights - mean(weights))/sd(weights)
}
N <- length(times)
# compute data cells edges
cells_edges <- c(times[1], 0.5*(head(times,-1)+tail(times,-1)), tail(times,1))
### Prior and Fitness function ###
if (data_mode==3) { # Points Measurements
# defining prior
if (missing(gamma)) {
#reported at the end of section 3.3 in Scargle(2013)
prior <- -(1.32 + 0.577 *log10(N))
}
else {
prior <- log(gamma)
}
# defining fitness
fitness <- function(b_k, a_k) { return (b_k**2 / (4*a_k)) }
}
else { # Event data | Binned data
# defining prior
if (missing(p1)) {
prior <- (log(gamma))
}
else {
#equivalent to the one reported in Scargle(2013): ncp_prior = log(73.53*p1*N**(-0.478))-4
#taken from code linked in Scargle(2013)
prior <- (log(p1 /(0.0136*N**(0.478))) - 4)
}
# defining fitness
fitness <- function(N_k, T_k) { return (N_k*(log(N_k) - log(T_k))) }
}
### iterate over data cells: ### -------------------------------------------------------------------
best <- numeric(length(times))
last <- numeric(length(times))
for (k in 1:N) {
if (data_mode==3) {
b_k <- rev(cumsum( rev(-weights[1:k]/(sigmas[1:k]*sigmas[1:k])) )) #sum(x_n / s_n^2)
a_k <- rev(cumsum( rev(0.5/(sigmas[1:k]*sigmas[1:k])) )) #sum(1 / 2*s_n^2)
## compute fitness of all possible last blocks obtained by adding k-th data cell
fitness_k <- fitness(b_k, a_k) + prior
}
else {
N_k <- rev(cumsum( rev(weights[1:k]) )) # counts of every possible last block
T_k <- cells_edges[k+1] - cells_edges[1:k] # all the possible lengths of last block
## compute fitness of all possible last blocks obtained by adding k-th data cell
fitness_k <- fitness(N_k, T_k) + prior
}
## compute all possible partitions
A_k <- fitness_k + c(0, best[1:k-1]) # note: 1:k-1 is (1:k)-1 = 0:(k-1); the 0 index is dropped, giving best[1..k-1] (empty when k = 1)
## store best overall fitness (best A(k)) and last change point
best[k] <- max(A_k)
last[k] <- which.max(A_k)
}
### end iterations ### -----------------------------------------------------------------------------
# retrieve change points positions from "last" vector
change_points <- c()
icp <- last[length(times)]
while (icp > 1) {
change_points <- c(icp, change_points)
icp <- last[icp-1]
}
return (cells_edges[change_points])
}
ApplyChangePointsToHist <- function(cpts, data, times=0) {
#It computes the left bin edges and counts of 'data' using change points in 'cpts'
if (missing(times)) { times <- c(1:length(data)) }
edges <- c(times[1], cpts, tail(times, 1))
#bin_centers <- 0.5*(tail(edges, -1) + head(edges, -1))
cells_edges <- c(times[1], 0.5*(head(times,-1)+tail(times,-1)), tail(times,1))
width <- diff(cells_edges)
counts <- numeric(length(cpts)+1)
for (jdx in 1:length(counts)) {
Wcells <- 0 # denominator for mean computation
for (idx in 1:length(data)) {
if ((times[idx] < edges[jdx+1]) & (times[idx] >= edges[jdx])) {
counts[jdx] <- counts[jdx] + data[idx]*width[idx]
Wcells <- Wcells + width[idx]
}
}
counts[jdx] <- counts[jdx]/Wcells
}
return (list(bins=head(edges,-1), counts=counts))
}
Almost any kind of physical variable and measurement framework can be accommodated by this algorithm. With the implementation above we are ready to test it under the different assumptions made in [1].
The three main data types treated in [1] are: event data, binned data and point measurements.
Event data are the rawest kind of data and can be represented as a series of times of discrete events. An example is cosmic rays detected with a scintillator and registered as a sequence of arrival times.
The fitness function for a block of these data is the log-likelihood:

$$f(B_k) = N_k \left(\log N_k - \log T_k\right)$$

where $N_k$ is the number of events in block $k$ and $T_k$ is the block length.
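This block fitness can be sketched numerically with toy values:

```r
# Log-likelihood fitness of a block with N_k events over length T_k
block_fitness <- function(N_k, T_k) N_k * (log(N_k) - log(T_k))

block_fitness(10, 2.5)  # equals 10 * log(4)
```

Note that the fitness grows when the same number of events is packed into a shorter block, which is what rewards partitions that isolate bursts.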
From extensive simulations of signal-free observational noise over a range of values of $N$ and of the adopted correct detection rate $p_0$, the following calibrated prior was found in [2]:

$$\mathrm{ncp\_prior} = 4 - \ln\!\left(73.53\, p_{0}\, N^{-0.478}\right)$$

where $N$ is the number of data cells.
We test our algorithm on this kind of data with a dataset containing information about earthquakes in California in the years 1982-2011. We run the algorithm over the dataset, filtered for uncorrelated events, to see whether it can efficiently detect changes in the event frequency.
#read the dataset
data <- read.table("SouthCalifornia-1982-2011_Physics-of-Data.dat", header=FALSE,
col.names=c('index','trigger','time','magnitudes','X','Y','Z'))
head(data)
#filter by uncorrelated events
timetags <- data$time[data$trigger==-1]
cat("Length of the dataset:", length(timetags))
#apply the algorithm
cpts <- Bayesian_Blocks(data_mode=1, times=timetags, p1=0.01)
cat("Number of Change Points:", length(cpts))
#plot the results
options(repr.plot.width=12, repr.plot.height=8)
# data rebinned with Bayesian Block algorithm
edges <- c(timetags[1], cpts, tail(timetags, 1))
h1 <- hist(timetags, breaks=length(edges)*12, plot=FALSE)
h2 <- hist(timetags, breaks=edges, plot=FALSE)
#plot the results
plot(head(h1$breaks,-1), h1$density, col="blue", type="s", main="Earthquakes frequency distribution",
xlab='Time [s]',ylab='Density',ylim=c(0,max(h2$density)))
lines(head(h2$breaks,-1), h2$density, col="red", type="s", lwd=1.5)
legend("topright", inset=c(-0.1,0), legend=c("Evenly spaced bins", "Bayesian Blocks"), col=c("blue","red"),
lwd=c(2,2), lty=c(1,1), border=FALSE, box.lty=0)
grid()
box()
#standard deviation of frequencies computed in B.B. bins
freqs <- c()
for (i in 1:(length(cpts)-1)){
a <- data[data$time>=cpts[i] & data$time<cpts[i+1] , ]
freqs <- c(freqs,length(unlist(a['time']))/(cpts[i+1]-cpts[i]))
}
cat("Standard deviation of frequency from Bayesian Blocks:", sd(freqs))
#standard deviation of frequencies computed in evenly spaced bins
step <- max(data$time)/length(cpts)
freqs <- c()
for (i in 1:length(cpts)){
a <- data[data$time>=(i-1)*step & data$time<i*step , ]
freqs <- c(freqs,length(unlist(a['time']))/(step))
}
cat("Standard deviation of frequency from evenly spaced bins:", sd(freqs))
These data are similar to the ones presented above, but with the events collected into bins, which need not be equal in width or evenly spaced. In other words, in this case we are already working with histograms.
Constructing a histogram of non-sequential measured values is very similar to the estimation of a piecewise constant model for the same data treated as if they were sequential. Hence, histograms can be constructed by simply ordering the measured values and applying our algorithm for event data.
The likelihood for bin $n$ is given by the Poisson distribution:

$$L_n = \frac{(\lambda W_n)^{N_n}\, e^{-\lambda W_n}}{N_n!}$$

where $N_n$ is the number of events in bin $n$, $\lambda$ is the constant event rate of the block model and $W_n$ is the bin width.
We test our algorithm with data collected using a germanium detector from a source of $^{241}Am$, $^{137}Cs$ and $^{60}Co$.
#read the dataset
data <- scan("B19036_AmCsCo_20180316.dat", skip=2)
#apply the algorithm
cpts <- Bayesian_Blocks(data_mode=2, weights=data, p1=0.01)
cat("Found ", length(cpts), " change points.")
#compute the counts
hs <- ApplyChangePointsToHist(cpts=cpts, data=data)
#plot the results
par(mfrow=c(1,1))
options(repr.plot.width=12, repr.plot.height=8) # alternatively 10,7
# original data
tts <- 1:length(data)
Norm <- sum(data+1)
plot(tts, (data+1)/Norm, type="s", col=rgb(0,0,1,alpha=0.6), log='y',
xlab="Energy [ADC channel]", ylab="Density", main="Am-Cs-Co spectra with B.B. algorithm")
polygon(tts, (data+1)/Norm, col = rgb(0,0,1,alpha=0.1), border=FALSE)
# data rebinned with Bayesian Block algorithm
lines(hs$bins, (hs$counts+1)/Norm, type="s", col=rgb(1,0,0,alpha=1), lwd=1.5)
legend("topright", inset=c(-0.1,0), legend=c("Original data", "Bayesian Blocks"), col=c("blue","red"),
lwd=c(2,2), lty=c(1,1), border=FALSE, box.lty=0)
grid()
box()
# kernel density estimation
tts <- 1:length(data)
Norm <- sum(data+1)
KDE <- density(tts, weights=(data+1)/Norm, bw=8)
options(repr.plot.width=12, repr.plot.height=8)
plot(tts, (data+1)/Norm, type="s", col=rgb(0,0,1,alpha=0.6), log='y',
xlab="Energy [ADC channel]", ylab="Density", main="Am-Cs-Co spectra with KDE")
polygon(tts, (data+1)/Norm, col = rgb(0,0,1,alpha=0.1), border=FALSE)
lines(KDE, type="s", col="red", lwd=2)
legend("topright", inset=c(-0.1,0), legend=c("Original data", "Kernel Density Est."), col=c("blue","red"),
lwd=c(2,2), lty=c(1,1), border=FALSE, box.lty=0)
grid()
box()
# calibration
kevs <- c(59.54, 661.66, 1173.24, 1332.51) # energy values in keV
ADCs <- c(which.max(data[1:1000]), which.max(data[1000:2000])+999,
          which.max(data[2500:3200])+2499, which.max(data[3201:4000])+3200)
calib <- lm(kevs ~ I(ADCs))
xx <- c(seq(min(ADCs), max(ADCs), len=500))
yy <- calib$coefficients[1] + calib$coefficients[2]*xx
# plot
options(repr.plot.width=12, repr.plot.height=8)
plot(ADCs, kevs, pch=19, col="blue", main="Spectrum Calibration",
xlab="Energy [ADC channel]", ylab="Energy [keV]")
lines(xx, yy, type="l", col="red")
label <- paste0("f(x) = a*x + b\n a : ", formatC(calib$coefficients[[2]], width=5),
"\n b : ", formatC(calib$coefficients[[1]], width=5))
text(500, 1000, labels=label, pos=4, cex=1.2, xpd=TRUE)
grid()
# residuals plot
par(new=TRUE, omi=c(0.8,0,0,0.5))
layout(matrix(1:4,2))
plot(ADCs, calib$residuals, type="p", col="blue", pch=18, cex=1.2, main="Residuals",
xlab="", ylab="")
lines(ADCs, c(rep(0,length(ADCs))), type="l", lty=2, col="red")
segments(x0=ADCs,y0=rep(0,length(ADCs)), y1=calib$residuals, col=rgb(0,0,1,alpha=0.5))
To show how the algorithm works we produce an animation. To generate the frames, we run the algorithm in a loop over the entire dataset, passing it the first i data points at each iteration.
data <- scan("B19036_AmCsCo_20180316.dat", skip=2)
times <- c(1:length(data))
Norm <- sum(data+1)
#calibration
times <- calib$coefficients[1] + calib$coefficients[2]*times
steps <- c(seq(1, 8192, 80), 8192)
# loop
system("mkdir frames")
for (i in steps) {
cps <- Bayesian_Blocks(data_mode=2,times=times[1:i], weights=data[1:i], p1=0.01)
counts <- ApplyChangePointsToHist(cps, data[1:i], times=times[1:i])
name <- paste0("frames/frame_", formatC(i, width=4, flag="0"), ".png")
png(file=name, width=1280, height=720)
# original data
plot(times[1:i], (data[1:i]+1)/Norm, type="s", col=rgb(0,0,1,alpha=0.6), log="y",
ylim=c(1,max(data))/Norm, xlim=c(1,max(times)), xlab="Energy [keV]", ylab="Density",
main="Am−Cs−Co spectra with B.B. algorithm")
polygon(times[1:i], c(head(data[1:i]+1,-1), 1)/Norm, col = rgb(0,0,1,alpha=0.1), border=FALSE)
# data rebinned with Bayesian Block algorithm
lines(counts$bins, (counts$counts+1)/Norm, type="s", col=rgb(1,0,0,alpha=1), lwd=2)
x1 <- tail(counts$bins, 1) ; x2 <- times[i]
y1 <- tail(counts$counts+1, 1)/Norm ; y2 <- 1/Norm
polygon(c(x1,x1,x2,x2), c(y2,y1,y1,y2), col=rgb(1,0,0,alpha=0.4), border=FALSE)
legend("topright", legend=c("Original data", "Bayesian Blocks"), col=c("blue","red"),
lwd=c(2,2), lty=c(1,1), box.lty=0) #inset=c(-0.1,0.01)
grid()
box()
dev.off()
}
system("convert -delay 10 frames/*.png BayesianBlocksAnimation.gif")
library("IRdisplay")
display_png(file="BayesianBlocksAnimation.gif")
These data represent measurements of a physical quantity over time, whose time dependence we want to characterize. The inevitable corruption due to observational errors is frequently countered by smoothing the data and/or fitting a model. As with the other data modes, Bayesian Blocks offers a different approach to this issue, making use of knowledge of the observational error distribution and avoiding the information loss entailed by smoothing.
Considering the case where the errors obey a normal probability distribution with zero mean and known variance, a theoretical treatment gives, to first order, the fitness function:

$$f(B_k) = \frac{b_k^2}{4 a_k}$$

where $a_k = \sum_{n} \frac{1}{2\sigma_n^2}$ and $b_k = -\sum_{n} \frac{x_n}{\sigma_n^2}$, with $x_n$ the measured values in block $k$ and $\sigma_n$ their errors.
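The Gaussian block fitness mirrors the computation in the implementation (the cumulative sums of `-weights/sigmas^2` and `0.5/sigmas^2`); a sketch with toy measurements:

```r
# Gaussian-error block fitness from toy measurements x with errors s
x <- c(1.0, 1.2, 0.9)
s <- c(0.5, 0.5, 0.5)
a_k <- sum(0.5 / s^2)        # sum_n 1 / (2 sigma_n^2)
b_k <- -sum(x / s^2)         # -sum_n x_n / sigma_n^2
fitness <- b_k^2 / (4 * a_k) # first-order block fitness
```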
A simulation study reported in [1], carried out to calibrate the prior for normally distributed point measurements, yields the relation:

$$\mathrm{ncp\_prior} = 1.32 + 0.577 \log_{10} N$$

where $N$ is the number of data points.
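This empirical relation is what the implementation uses when gamma is not supplied (with a negative sign, since the prior is added to the fitness as a penalty); a quick numerical sketch:

```r
# Empirical prior for point measurements, as added to the fitness
pm_prior <- function(N) -(1.32 + 0.577 * log10(N))

pm_prior(1000)  # approximately -3.051
```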
set.seed(1230642885)
mexican_hat <- function(t, sigma, pos=0) { # taken from Wikipedia
return ( (2/(sqrt(3*sigma)*pi**(0.25)))*(1 -(t-pos)**2/sigma**2)*exp(-(t-pos)**2 / (2*sigma**2)) )
}
N <- 1000
amplitude <- runif(5,10,100)
sigma <- runif(5,0,100)
position <- runif(5,0,1000)
t <- 1:N
# building signal
signal <- numeric(N)
for (jj in 1:5) {
signal <- signal + amplitude[jj]*mexican_hat(t, sigma[jj], position[jj])
}
noise <- rnorm(N, mean = 0, sd = 1)
simul <- signal + noise
cpts <- Bayesian_Blocks(data_mode=3, times=t, weights=simul, sigmas=c(rep(1,N)))
hs <- ApplyChangePointsToHist(cpts, simul, t)
# plot
options(repr.plot.width=12, repr.plot.height=8)
# original data
plot(t, simul, type="l", col=rgb(0,0,1,alpha=0.6), xlab="Time [a.u.]", ylab="Amplitude [a.u.]",
main="B.B. points measurements simulation")
#polygon(t, simul, col = rgb(0,0,1,alpha=0.1), border=FALSE)
# data rebinned with Bayesian Block algorithm
lines(hs$bins, hs$counts, type="s", col=rgb(1,0,0,alpha=1), lwd=1.5)
lines(t, signal, type="s", col= rgb(0,1,0,alpha=1), lwd=2)
legend("topleft", inset=c(0,0.01), legend=c("Original data", "Bayesian Blocks",'True signal'),
col=c("blue","red",'green'),
lwd=c(2,2), lty=c(1,1), border=FALSE, box.lty=0)
grid()
box()
The number of possible partitions (i.e., the number of ways $N$ cells can be arranged in blocks) is $2^{N-1}$. The implemented algorithm should scale as $O(N^2)$ in computational time while implicitly performing a complete search of this space: it finds the global optimum among all partitions without an exhaustive explicit search, which would be impossible for almost any value of $N$ arising in practice.
In the following we test the computational time of the algorithm and fit the resulting plot with a quadratic curve.
data <- scan("B19036_AmCsCo_20180316.dat", skip=2)
times <- c(1:length(data))
steps <- c(seq(1, length(data), 500), length(data))
comp_times <- c()
for (i in steps) {
start_time <- Sys.time()
cps <- Bayesian_Blocks(data_mode=2,times=times[1:i], weights=data[1:i], p1=0.01)
end_time <- Sys.time()
comp_times <- c(comp_times, as.numeric(difftime(end_time, start_time, units="secs")))
}
x <- steps
y <- comp_times
# second order fit
fit <- lm(y ~ I(x)+I(x^2))
xx <- steps
yy <- predict(fit,data.frame(x=xx))
# plot
options(repr.plot.width=12, repr.plot.height=8)
# measures of time
plot(steps, comp_times, type="p", col=rgb(0,0,1,alpha=0.5), xlab="N ", ylab="Time [s]",
main="Computational Time as function of N", pch=18, cex=1.5)
# fitted line
lines(xx,yy, type="l", col="red", lwd=1.5)
legend("right", inset=c(-0.1,0), legend=c("Measurements", "Fitted model"), col=c("blue","red"),
lwd=c(2,2), lty=c(0,1), pch=c(18,NA), border=FALSE, box.lty=0)
label <- paste0("f(x) = a*x² + b*x + c\n a : ", formatC(fit$coefficients[[3]], width=5),
"\n b : ", formatC(fit$coefficients[[2]], width=5),
"\n c : ", formatC(fit$coefficients[[1]], width=5),
"\n chi² : ", formatC(sum(fit$residuals**2/yy), width=5) )
text(6700, 0.5, labels=label, pos=4, cex=1.2, xpd=TRUE)
grid()
box()
# residuals plot
par(new=TRUE, omi=c(0,1,0.8,0))
layout(matrix(4:1,2))
plot(steps, fit$residuals, type="p", col="blue", pch=18, cex=1.2, main="Residuals",
xlab="", ylab="", bg="white")
lines(steps, c(rep(0,length(steps))), type="l", lty=2, col="red")
segments(x0=steps,y0=rep(0,length(steps)), y1=fit$residuals, col=rgb(0,0,1,alpha=0.5))